Cleaning the Data in R
1.) Read all four data sets into R.
Example:
income <- read.csv(
  "/Users/kyleknox/Documents/M.S. Applied Statistics/SPRING 2024/STA533/week5/income_per_person.csv",
  check.names = FALSE
)
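The `check.names = FALSE` argument matters here because the year columns have names that begin with digits. The following is a minimal sketch of the difference, assuming a made-up two-row CSV passed inline through `text =` instead of the real file; the country names and values are illustrative only.

```r
# Made-up CSV text standing in for income_per_person.csv (illustrative values only)
csv_text <- "geo,1999,2000\nCountry A,1000,1100\nCountry B,2000,2100"

# Default behaviour: R rewrites column names that are not valid R names
with_check <- read.csv(text = csv_text)
names(with_check)     # "geo" "X1999" "X2000"

# check.names = FALSE keeps the year headers exactly as written in the file
without_check <- read.csv(text = csv_text, check.names = FALSE)
names(without_check)  # "geo" "1999" "2000"
```

Keeping the headers unchanged means the later reshaping step produces year values such as 2000 rather than X2000.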
2.) Create new datasets that are reshaped from wide to long format, with
one row per country and year.
Example:
incomepp <- income %>%
  gather(key = "year", value = "income", -geo) %>%
  rename(country = geo)
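In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. The sketch below shows what an equivalent reshape might look like on a toy wide table; the data values are made up, and converting `year` to an integer with `names_transform` is an optional choice added here, not something the original script does.

```r
library(dplyr)
library(tidyr)

# Toy wide table in the same shape as the income data (values are made up)
income <- data.frame(
  geo    = c("Country A", "Country B"),
  `1999` = c(1000, 2000),
  `2000` = c(1100, 2100),
  check.names = FALSE
)

# One row per country and year; year is converted from text to integer
incomepp <- income %>%
  pivot_longer(
    cols = -geo,
    names_to = "year",
    values_to = "income",
    names_transform = list(year = as.integer)
  ) %>%
  rename(country = geo)

incomepp
```

If `year` is converted in one dataset, it needs the same type in every dataset that is later joined on it.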
3.) Create a new data set using an inner join that merges the
income_per_person.csv data with the life_expectancy_years.csv data.
Example:
LifeExpIncom <- inner_join(incomepp, lifeyr, by = c("country", "year"))
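An inner join silently drops any country-year pair that is missing from either table, so it can help to check what was lost. This is a minimal sketch on toy data; the `lifeExp` column name and all values are assumptions made for illustration.

```r
library(dplyr)

# Toy long-format tables (made-up values); "Country C" and "Country D" have no match
incomepp <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  year    = c(2000L, 2000L, 2000L),
  income  = c(1000, 2000, 3000)
)
lifeyr <- data.frame(
  country = c("Country A", "Country B", "Country D"),
  year    = c(2000L, 2000L, 2000L),
  lifeExp = c(60, 70, 75)
)

# Keeps only the rows whose country and year appear in both tables
joined <- inner_join(incomepp, lifeyr, by = c("country", "year"))

# Rows of incomepp that found no partner in lifeyr
dropped <- anti_join(incomepp, lifeyr, by = c("country", "year"))
```

Here `joined` holds the two matching countries and `dropped` lists the one row from `incomepp` that the join discarded.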
4.) Rename the variables in the countries_total.csv and population_total
datasets to match the variables in LifeExpIncom.
Example:
country_year <- countryR %>%
  rename(country = name)
country_region <- country_year %>%
  select(country, region)
5.) Create a new data set using an inner join that merges LifeExpIncom
with the country_region dataset.
Example:
LifeIncomeRegion <- inner_join(LifeExpIncom, country_region, by = "country")
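Because this join uses `country` alone as the key, a country listed twice in the lookup table would duplicate every matching row in the result. The sketch below shows one way to check for that before joining; the lookup table is hypothetical and contains a deliberate duplicate.

```r
library(dplyr)

# Hypothetical lookup table with an accidental duplicate row for "Country A"
country_region <- data.frame(
  country = c("Country A", "Country A", "Country B"),
  region  = c("Europe", "Europe", "Asia")
)

# Countries that appear more than once in the lookup table
dup_keys <- country_region %>%
  count(country) %>%
  filter(n > 1)

# Dropping exact duplicate rows leaves one row per country
country_region_unique <- distinct(country_region)
```

An empty `dup_keys` would indicate that each country maps to exactly one region.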
6.) Create a final data set using an inner join merging the new data set
LifeIncomeRegion with the new population dataset.
Example:
LifeIncomePopulation <- inner_join(LifeIncomeRegion, population_fixed, by = c("country", "year"))
7.) Save the new data set as a CSV file to use when creating figures.
Example:
write.csv(
  LifeIncomePopulation,
  "/Users/kyleknox/Documents/M.S. Applied Statistics/SPRING 2024/STA533/week5/LifeIncomePopulation.csv",
  row.names = FALSE
)
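A quick way to confirm the export worked is to read the file back and compare it with the data frame in memory. This is a minimal sketch with a toy data frame standing in for LifeIncomePopulation; `tempdir()` is used only so the example runs on any machine, and the values are made up.

```r
# Toy stand-in for LifeIncomePopulation (values are made up)
toy <- data.frame(
  country = c("Country A", "Country B"),
  year    = c(2000L, 2000L),
  income  = c(1000.5, 2000.25)
)

# Build the path from parts; tempdir() is an assumption so the sketch runs anywhere
out_path <- file.path(tempdir(), "LifeIncomePopulation.csv")

# row.names = FALSE avoids writing an extra unnamed index column
write.csv(toy, out_path, row.names = FALSE)

# Read the file back and compare it with the original
round_trip <- read.csv(out_path)
all.equal(toy, round_trip)
```

Building the path with `file.path()` also makes it easier to swap the absolute directory for a project-relative one.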
To see the entire DataCleanup.R file, go to
https://raw.githubusercontent.com/kyleknox3/STA553/main/week5/DataCleanup.R
Association between Income and Life Expectancy in 2000: A Story
A Look Inside the Plot
The scatter plot visualizes the relationship between income per person
and life expectancy across the world in the year 2000. Each point’s size
represents the country’s population, and its color indicates the
country’s region.
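The encodings described above (position for income and life expectancy, size for population, color for region) can be sketched in ggplot2 as follows. The data frame is a toy stand-in with made-up values, and the column names `income`, `lifeExp`, `region`, and `population` are assumptions about how the cleaned data is laid out.

```r
library(ggplot2)

# Toy rows standing in for the year-2000 data (all values are made up)
toy <- data.frame(
  country    = c("Country A", "Country B", "Country C", "Country D"),
  region     = c("Europe", "Africa", "Asia", "Asia"),
  income     = c(30000, 1500, 4000, 25000),
  lifeExp    = c(78, 52, 66, 80),
  population = c(6e7, 3e7, 1.2e9, 1.3e8)
)

# Size is mapped to population and color to region; alpha softens overlapping points
p <- ggplot(toy, aes(x = income, y = lifeExp, color = region, size = population)) +
  geom_point(alpha = 0.5) +
  labs(
    x = "Income per Person",
    y = "Life Expectancy",
    color = "Region",
    size = "Population Size"
  ) +
  theme_minimal()

print(p)
```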
Income and life expectancy show a strong positive correlation:
countries with higher incomes tend to have longer life expectancies.
This illustrates the impact of economic wealth on health outcomes,
likely through better access to healthcare, food resources, and living
conditions.
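The strength of the association can be put into numbers with `cor()`. The sketch below uses six made-up data points rather than the real data, so the resulting values say nothing about the actual plot; it only shows how the raw, log-scale, and rank-based correlations might be computed.

```r
# Toy values for six hypothetical countries (made up, for illustration only)
income  <- c(500, 1200, 3000, 9000, 25000, 40000)
lifeExp <- c(48, 55, 63, 70, 76, 79)

# Pearson correlation on the raw income scale
r_raw <- cor(income, lifeExp)

# Pearson correlation after a log10 transform, which suits a curved relationship
r_log <- cor(log10(income), lifeExp)

# Spearman correlation depends only on the ordering of the values
r_rank <- cor(income, lifeExp, method = "spearman")
```

Comparing `r_raw` with `r_log` gives a rough sense of whether the relationship is closer to linear in income or in the logarithm of income.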
The colors, one per region, highlight the geographical disparities in
income and life expectancy. European countries, in green, tend to have
higher incomes and longer life expectancies than African countries, in
teal.
The varying point sizes show that some larger countries, like those in
the Asia region, span a wide range of income and life expectancy.
However, countries with large populations do not always have the
highest incomes, indicating that a bigger population does not
necessarily mean greater economic wealth.
The lower end of the income scale shows a dense cluster of points with
a steep gradient in life expectancy, signifying that even small
increases in income within this cluster can potentially lead to
substantial increases in life expectancy.
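One way to express a steep gradient at low incomes is a model that is linear in the logarithm of income, where each doubling of income is associated with the same gain in life expectancy. The following is a rough sketch under that assumption, fitted to made-up values rather than the real data.

```r
# Toy values for six hypothetical countries (made up, for illustration only)
income  <- c(500, 1200, 3000, 9000, 25000, 40000)
lifeExp <- c(48, 55, 63, 70, 76, 79)

# Life expectancy modelled as a straight line in log10(income)
fit <- lm(lifeExp ~ log10(income))

# Estimated change in life expectancy per tenfold increase in income
slope <- unname(coef(fit)[2])

# Implied change per doubling of income under this model
gain_per_doubling <- slope * log10(2)
```

Under this model, going from 500 to 1,000 is associated with the same gain as going from 20,000 to 40,000, which is why small absolute increases matter most at the low end. This describes an association in the fitted data, not a causal effect.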